Abstract

The first two authors have shown [KK99, KK00] that the sum of the exponents
(and thus the number) of maximal repetitions of exponent at least 2 (also
called runs) is linear in the length of the word. The exponent 2 in the
definition of a run may seem arbitrary. In this paper, we consider maximal
repetitions of exponent strictly greater than 1.

1 Introduction

Repetitions (periodicities) are fundamental concepts in word combinatorics
[Lot83, CK97, KK05]. Recall that each word w is characterized by the minimal
period p(w) and by the exponent e(w), which is the ratio |w|/p(w). A great deal
of work in word combinatorics has been devoted to the study of words that do
not contain subwords of a given exponent [CK97]. Another research direction,
of more algorithmic nature, is the efficient identification of all subwords of a
given exponent in a word [KK05], which raises the combinatorial question of
the possible number of such subwords.

In [KK99, KK00], the first two authors considered the notion of maximal
repetitions of a word, which are subword occurrences that cannot be extended
outwards without changing their minimal period. They proved that the number
of maximal repetitions of exponent at least 2 is linearly bounded in the length
of the word. It has been conjectured that this number is actually smaller than
the word length. It has also been proved that not only the number of maximal
repetitions of exponent 2 or more, but also the sum of exponents of these
repetitions, is linearly bounded. The linear bound on the number
of repetitions, in turn, allowed them to prove that all such maximal repetitions
can be found in linear time. More recently, other researchers attempted to
improve these results by finding a simpler proof of the linear bound implying a
smaller multiplicative constant. The most recent achievement in this direction
is presented in [CIT08].

A big question that remained open in this development concerns the lower
bound of 2 on the exponent of the repetitions considered. While this bound is
intuitively natural (as it requires some subword to be consecutively repeated
at least twice), it has no formal justification. Moreover, word combinatorics
provides many separation results where the "right" bound on the exponent is
not an "intuitive" number. For example, Dejean's famous result states that
the exponents that can be avoided on a ternary alphabet are exponents greater
than 7/4 [Dej72]. As another example, there are exponentially many binary words
avoiding exponents greater than 7/3, while there are only polynomially many of
them avoiding smaller exponents [KS04].

In this paper, we completely lift the lower bound on the exponent and focus
on the maximal repetitions of any exponent greater than 1. Note that
repetitions with exponent between 1 and 2 are subwords of the form uvu that can
be viewed as non-consecutive repetitions. Therefore, in this paper we consider
both consecutive (periodicities) and non-consecutive repetitions. To the best of
our knowledge, the number of repetitions of exponent smaller than 2 has not
been studied.

Instead of directly counting the repetitions or the sum of their exponents,
we consider the sum of exponents decremented by 1. The main idea is that
repetitions with exponents close to 1 (i.e. subwords uvu with $|v| \gg |u|$)
contribute to the sum with an amount close to 0. We prove that this sum is
upper-bounded by n ln(n) (Theorem 1), which immediately implies that the number
of maximal repetitions of any exponent greater than 1 + ε is bounded by
$\frac{1}{\varepsilon} n \ln(n)$.
On the other hand, the number of all maximal repetitions can be quadratic
(Theorem 5). We also obtain that the lower bound for the sum is $\frac{n}{k} - 1$,
where k is the alphabet size, and we characterize the words achieving this lower
bound (Theorem 6). Finally, we study this sum for the words containing only
repetitions with a period bounded by a constant.

While the "whole picture" of the count of the number of maximal repetitions 
with exponent smaller than 2 is still incomplete, we believe that our results 
represent the first step in this direction. 

2 Definitions 

Recall that for any word w, the (minimal) period, denoted p(w), is the minimal
natural number p such that w[i] = w[i + p] whenever positions i and i + p both
exist in w. The exponent of w is defined as e(w) = |w|/p(w) (|w| is the length
of w). A root of w is any subword of w of length p(w). The prefix (resp. suffix)
root of w is the prefix (resp. suffix) of w of length p(w).
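
As a concrete illustration of these definitions, the following Python sketch computes the minimal period, the exponent and the prefix and suffix roots of a word by direct search. The function names are illustrative assumptions and do not come from the paper, and positions are 0-based here whereas the text uses 1-based positions.

```python
from fractions import Fraction


def minimal_period(w: str) -> int:
    """Smallest p >= 1 such that w[t] == w[t + p] whenever both positions exist."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[t] == w[t + p] for t in range(n - p)):
            return p
    raise ValueError("the empty word has no period")


def exponent(w: str) -> Fraction:
    """Exact ratio |w| / p(w)."""
    return Fraction(len(w), minimal_period(w))


def prefix_root(w: str) -> str:
    """Prefix of w of length p(w)."""
    return w[:minimal_period(w)]


def suffix_root(w: str) -> str:
    """Suffix of w of length p(w)."""
    return w[len(w) - minimal_period(w):]


for word in ("abaab", "abab", "aba"):
    print(word, minimal_period(word), exponent(word),
          prefix_root(word), suffix_root(word))
```

For instance, the word abaab has period 3 and exponent 5/3, so it is a repetition of the form uvu with u = ab and v = a.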

Given w, a maximal repetition in w is a subword w[i..j] such that p(w[i..j]) <
p(w[i - 1..j]) (provided that i ≠ 1) and p(w[i..j]) < p(w[i..j + 1]) (provided
that j ≠ |w|). Informally, "maximality" means that the subword is extended
outwards as much as possible so long as its period is preserved.

In this paper, we will be interested in maximal repetitions of any exponent
greater than 1. The set of these subwords of w will be denoted $\mathcal{M}(w)$.

Note that any two occurrences of the same letter in w define a maximal
repetition with a period that is a divisor of the distance between these
occurrences. In this case, we will speak about a maximal repetition defined by
a letter match.

3 Sum of decremented exponents 

For a word w, we will be interested in the sum of exponents of all maximal 
repetitions, decremented by 1: 

$$\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \qquad (1)$$

This quantity can be viewed as the difference between the sum of exponents of 
all maximal repetitions and the number of these repetitions. 

Theorem 1. For every word w of length n, we have
$\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \le n \ln(n)$.

Proof. For each maximal repetition r with period p, we distribute the value
$e(r) - 1 = \frac{|r| - p}{p}$ over (|r| - p) pairs of matching letters
(w[i], w[i + p]), w[i] = w[i + p], within the repetition. Each such pair
contributes to the sum with weight $\frac{1}{p}$. Consider two positions i and j,
$1 \le i < j \le n$, in w. If w[i] = w[j], then this match participates in some
repetition, but it is counted only if the period of this repetition is j - i, in
which case it contributes to the sum with the amount $\frac{1}{j - i}$. We thus have

$$\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \le \sum_{1 \le i < j \le n} \frac{1}{j - i} = \sum_{i=1}^{n-1} \frac{n - i}{i} = n \sum_{i=1}^{n-1} \frac{1}{i} - (n - 1) \le n \ln(n)$$

for $n \ge 2$. □
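
The bound of Theorem 1 can be checked experimentally on small words. The following Python sketch enumerates maximal repetitions by brute force, directly from the definition of Section 2, and computes sum (1) exactly. It is only an illustration under stated assumptions: the function names are hypothetical, positions are 0-based, and the cubic-or-worse running time makes it suitable for short words only.

```python
from fractions import Fraction
from math import log


def minimal_period(s):
    """Smallest p >= 1 such that s[t] == s[t + p] whenever both positions exist."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[t] == s[t + p] for t in range(n - p)):
            return p
    return 0  # only reached for the empty word


def maximal_repetitions(w):
    """All (start, end, period) with 0-based inclusive bounds and exponent > 1."""
    n = len(w)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            p = minimal_period(w[i:j + 1])
            if p == j - i + 1:
                continue  # exponent exactly 1: not a repetition
            if i > 0 and minimal_period(w[i - 1:j + 1]) == p:
                continue  # extendable to the left with the same period
            if j + 1 < n and minimal_period(w[i:j + 2]) == p:
                continue  # extendable to the right with the same period
            found.append((i, j, p))
    return found


def decremented_exponent_sum(w):
    """Sum (1): exponents of all maximal repetitions, each decremented by 1."""
    return sum(Fraction(j - i + 1, p) - 1 for i, j, p in maximal_repetitions(w))


for w in ("aaaa", "abab", "abaab"):
    total = decremented_exponent_sum(w)
    print(w, maximal_repetitions(w), total, float(total) <= len(w) * log(len(w)))
```

On the word abaab, this finds the three maximal repetitions aba (period 2), abaab (period 3) and aa (period 1), for a total of 1/2 + 2/3 + 1 = 13/6.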

If we count only maximal repetitions of period at most p, then the following 
bound holds. 

Corollary 2. For every word w of length n, we have
$\sum_{r \in \mathcal{M}(w),\ p(r) \le p} (e(r) - 1) \le n(\ln(p) + 1)$.

Proof. If only repetitions of period at most p are considered, then, according to
the proof of Theorem 1, the sum is bounded as follows:

$$\sum_{r \in \mathcal{M}(w),\ p(r) \le p} (e(r) - 1) \le \sum_{1 \le i < j \le \min\{i + p,\, n\}} \frac{1}{j - i} \le n(\ln(p) + 1).$$ □
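
The counting step shared by Theorem 1 and Corollary 2 is purely arithmetic and can be evaluated exactly. The sketch below (the name `pair_weight_sum` is an assumption made for this illustration) groups the pairs (i, j) by their distance d = j - i, of which there are n - d, and compares the resulting sum with n(ln(p) + 1).

```python
from fractions import Fraction
from math import log


def pair_weight_sum(n, p):
    """Sum of 1/(j - i) over 1 <= i < j <= n with j - i <= p, computed exactly."""
    # There are n - d pairs (i, j) at distance d, each of weight 1/d.
    return sum(Fraction(n - d, d) for d in range(1, min(p, n - 1) + 1))


n = 100
for p in (1, 2, 5, 20, 99):
    total = float(pair_weight_sum(n, p))
    print(p, round(total, 3), round(n * (log(p) + 1), 3))
```

Taking p = n - 1 removes the restriction on the distance and gives back the sum used in the proof of Theorem 1.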

Complementarily, if we count only maximal repetitions of period at least p, 
then we get 

Corollary 3. For every word w of length n, we have
$\sum_{r \in \mathcal{M}(w),\ p(r) \ge p} (e(r) - 1) \le n \ln(n/p)$.

Proof. Similar to Corollary 2. □

Assume now that we focus only on maximal repetitions of exponent (1 + ε)
or more, and we want to count their number. Theorem 1 immediately provides
a nontrivial upper bound.

Corollary 4. For every word w of length n and every ε > 0, the number of
maximal repetitions of exponent at least (1 + ε) in w is at most
$\frac{1}{\varepsilon} n \ln(n)$.

Proof. Consider the sum of Theorem 1. Each repetition of exponent at least
(1 + ε) contributes at least ε to it, and therefore the number of those is at
most $\frac{1}{\varepsilon} n \ln(n)$. □

Similarly, Corollaries 2 and 3 imply respective upper bounds
$\frac{1}{\varepsilon} n(\ln(p) + 1)$ and $\frac{1}{\varepsilon} n \ln(n/p)$ on
the number of maximal repetitions of exponent at least (1 + ε)
and of period respectively at most p and at least p.

The following theorem shows that the upper bound of Theorem 1 is
asymptotically tight within a factor of 8 and that the number of all repetitions
of arbitrary exponent can be quadratic (to be contrasted with Corollary 4).

Theorem 5. Let $w = (0011)^{n/4}$. Then

(i) $\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \ge \frac{1}{8} n \ln(n)$,

(ii) the number of all maximal repetitions of w is $\Omega(n^2)$.

Proof. (i) The whole word w is an obvious repetition of period 4; its
contribution to the sum is (n/4 - 1). Any other repetition can be specified by
a match between two 0's or two 1's that occur at a distance other than a
multiple of 4.

Consider a repetition r in which the letter at some position m, $m \equiv 1 \pmod 4$,
matches the letter at a position $\ell > m$, $\ell \equiv 2 \pmod 4$. This match
corresponds to the end letters of the repetition, as $w[m - 1] = 1$ (if $m \ne 1$)
while $w[\ell - 1] = 0$, and $w[\ell + 1] = 1$ (if $\ell \ne n$) while $w[m + 1] = 0$.

Furthermore, this repetition has period $\ell - m = |r| - 1$ and this period is
minimal, as the word $w[m..\ell - 1]$ contains one more 0 than 1's and therefore
the number of 0's and the number of 1's in $w[m..\ell - 1]$ are mutually prime,
which shows that $w[m..\ell - 1]$ is primitive (i.e. not an integer power of some
other word).

Therefore, any two such positions m and $\ell$ define a repetition that
contributes $1/(\ell - m)$ to the sum. In total, all such repetitions contribute
$\sum_{i=1}^{n/4} \frac{n/4 - i + 1}{4i - 3} \ge \frac{1}{32} n \ln(n)$.

There are three other symmetric cases: one corresponds to another way of
matching two 0's and the other two correspond to matching two 1's. The four
cases together yield
$\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \ge (\frac{n}{4} - 1) + 4 \cdot \frac{1}{32} n \ln(n) \ge \frac{1}{8} n \ln(n)$.

(ii) is obvious from the above, as the number of pairs of 0's and pairs of 1's
defining repetitions is quadratic. □

We now focus on the lower bound for sum (1). In the rest of the paper, we
assume that we have a k-letter alphabet $A_k = \{a_1, a_2, \ldots, a_k\}$.

Theorem 6. For all $w \in (A_k)^*$ of length n,
$\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \ge \frac{n}{k} - 1$, and equality holds
if and only if $w = (a_1 a_2 \ldots a_k)^{n/k}$ (modulo a permutation of alphabet
letters).

Proof. Given a word $w \in (A_k)^*$, consider all occurrences of a letter
$a_i \in A_k$ in w, and let $d^i_1, d^i_2, \ldots, d^i_{\ell_i}$ be the distances
between all consecutive occurrences of $a_i$ in w. Consider the sum

$$\sum_{a_i \in A_k} \sum_{j=1}^{\ell_i} \frac{1}{d^i_j} \qquad (2)$$

Observe that
$\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \ge \sum_{a_i \in A_k} \sum_{j=1}^{\ell_i} \frac{1}{d^i_j}$,
since two consecutive occurrences of $a_i$ necessarily participate in a
repetition with period equal to the distance between these occurrences, and
then contribute to sum (1) (see proof of Theorem 1).

Therefore, if we construct a word that minimizes sum (2) and for which
$\sum_{r \in \mathcal{M}(w)} (e(r) - 1) = \sum_{a_i \in A_k} \sum_{j} \frac{1}{d^i_j}$,
this will prove that this word also minimizes sum (1). Our goal is to prove
that this minimum is reached if and only if for any letter $a_i$, all
$d^i_j = k$, i.e. on words of the form $w = (a_1 a_2 \ldots a_k)^{n/k}$
(modulo a permutation of alphabet letters). Clearly, for such words, sum (1)
and sum (2) are both equal to $\frac{n}{k} - 1$.

By contradiction, consider a word w that does not have the form
$(a_1 a_2 \ldots a_k)^{n/k}$ and assume that it minimizes sum (2). Then there
exists a pair of positions $m_\ell < m_r$ such that $w[m_\ell] = w[m_r]$ and
$m_r - m_\ell < k$. Among all such pairs, consider the one with minimal $m_r$.

We show that for any position m, $k < m < m_r$, we must have w[m] = w[m - k].
This is because the letter w[m] cannot repeat on the left at a distance smaller
than k, as this would contradict the definition of $m_r$. On the other hand, the
closest occurrence of w[m] to the left cannot be at a distance larger than k
either. Indeed, if w[m] = w[m'] for some m' < m and m - m' > k and there
is no occurrence of w[m] in w[m' + 1..m - 1], then the subword w[m' + 1..m' + k]
is composed of at most k - 1 distinct letters and has length k, and therefore
contains a letter repeated at a distance at most k - 1. This again contradicts
the definition of $m_r$.

By the above, we can assume that
$w[1..m_r - 1] = (a_1 \ldots a_k)^q a_1 \ldots a_i$ (up to a permutation of
alphabet letters), $q \ge 1$, and $w[m_r] = a_j$ for some $j \ne i'$, where
$i' = i + 1$ if $i < k$ and $i' = 1$ if $i = k$. Consider the closest position
of $a_{i'}$ to the right of $m_r$, which we denote m'. (If such a position does
not exist, the proof below will trivially apply.)

We modify w by simultaneously

• replacing all occurrences of $a_j$ at positions $\ge m_r$ by $a_{i'}$, and

• replacing all occurrences of $a_{i'}$ at positions $\ge m'$ by $a_j$.

We show that this modification makes sum (2) smaller.

The only distances between consecutive occurrences of letters that will be
affected by the modification of w are the distance $m_r - m_\ell$ between the
corresponding occurrences of $a_j$ and the distance $m' - (m_r - k)$ between the
occurrences of $a_{i'}$. The new distances become respectively k (between
occurrences $m_r$ and $m_r - k$ of $a_{i'}$) and $m' - m_\ell$ (between
corresponding occurrences of $a_j$).
We show that

$$\frac{1}{m_r - m_\ell} + \frac{1}{m' - (m_r - k)} > \frac{1}{k} + \frac{1}{m' - m_\ell}.$$

This will show that sum (2) becomes smaller after the modification. For this,
we show that

$$\frac{1}{m_r - m_\ell} - \frac{1}{k} > \frac{1}{m' - m_\ell} - \frac{1}{m' - (m_r - k)},$$

that is,

$$\frac{k - m_r + m_\ell}{(m_r - m_\ell)\,k} > \frac{k - m_r + m_\ell}{(m' - m_\ell)(m' - (m_r - k))}.$$

The numerators of both sides are equal, and positive since $m_r - m_\ell < k$.
In the denominators, we have $m' - m_\ell > m_r - m_\ell$ and
$m' - (m_r - k) > k$, which proves the inequality.

We obtained a contradiction with the assumption that w minimizes sum
(2). This shows that a word that minimizes sum (2) must have the form
$w = (a_1 a_2 \ldots a_k)^{n/k}$ (modulo a permutation of alphabet letters). On
this word, sum (1) and sum (2) are both equal to $\frac{n}{k} - 1$. This proves
that w also minimizes sum (1). □
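
For small alphabets and short words, the lower bound of Theorem 6 can be confirmed by exhaustive search. The sketch below is an illustration under stated assumptions: it uses a brute-force enumeration of maximal repetitions written for this purpose (hypothetical helper names, 0-based positions), and it only considers lengths n ≥ k, reading $(a_1 \ldots a_k)^{n/k}$ as the prefix of length n of the infinite periodic word when k does not divide n.

```python
from fractions import Fraction
from itertools import product


def minimal_period(s):
    """Smallest p >= 1 such that s[t] == s[t + p] whenever both positions exist."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[t] == s[t + p] for t in range(n - p)):
            return p
    return 0  # only reached for the empty word


def maximal_repetitions(w):
    """All (start, end, period) with 0-based inclusive bounds and exponent > 1."""
    n = len(w)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            p = minimal_period(w[i:j + 1])
            if p == j - i + 1:
                continue  # exponent exactly 1
            if i > 0 and minimal_period(w[i - 1:j + 1]) == p:
                continue  # extendable to the left
            if j + 1 < n and minimal_period(w[i:j + 2]) == p:
                continue  # extendable to the right
            found.append((i, j, p))
    return found


def decremented_exponent_sum(w):
    """Sum (1) of the text, as an exact rational number."""
    return sum(Fraction(j - i + 1, p) - 1 for i, j, p in maximal_repetitions(w))


def cyclic_word(alphabet, n):
    """Prefix of length n of a_1 a_2 ... a_k repeated forever."""
    return "".join(alphabet[t % len(alphabet)] for t in range(n))


alphabet = "abc"
k = len(alphabet)
for n in range(k, 7):
    best = min(decremented_exponent_sum("".join(t))
               for t in product(alphabet, repeat=n))
    # Columns: n, minimum over all words, n/k - 1, value on the cyclic word.
    print(n, best, Fraction(n, k) - 1,
          decremented_exponent_sum(cyclic_word(alphabet, n)))
```

In each printed row the three values are expected to coincide, in line with the statement of the theorem.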



4 Words with repetitions of bounded period 

In this section, we study sum (1) in the case when all repetitions in w are of
period at most p. Recall that k is the alphabet size.

Theorem 7. Let the period of all repetitions of a word w (|w| = n) be bounded
by p. Then $\sum_{r \in \mathcal{M}(w)} (e(r) - 1) \le n + 3kp(\ln(p) + 1)$.

The proof will use Fine and Wilf's theorem (see e.g. [Lot83]), asserting
that if w has (not necessarily minimal) periods $p_1$ and $p_2$ and
$|w| \ge p_1 + p_2 - \gcd(p_1, p_2)$, then w also has the period
$\gcd(p_1, p_2)$. This implies, in particular, that two different repetitions
with minimal periods $p_1$ and $p_2$ cannot intersect on $(p_1 + p_2)$ letters
or more.
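
Fine and Wilf's theorem is easy to test exhaustively on short words. The following Python sketch (with hypothetical helper names `has_period` and `fine_wilf_holds`) checks the statement for every pair of periods of every binary word of length at most 10, and then shows on one example that the length condition cannot be weakened.

```python
from itertools import product
from math import gcd


def has_period(w, p):
    """True if w[t] == w[t + p] whenever both positions exist (p need not be minimal)."""
    return all(w[t] == w[t + p] for t in range(len(w) - p))


def fine_wilf_holds(w):
    """Check the conclusion of the theorem for every pair of periods of w."""
    n = len(w)
    periods = [p for p in range(1, n + 1) if has_period(w, p)]
    for p1 in periods:
        for p2 in periods:
            g = gcd(p1, p2)
            if n >= p1 + p2 - g and not has_period(w, g):
                return False
    return True


print(all(fine_wilf_holds("".join(t))
          for n in range(1, 11) for t in product("ab", repeat=n)))

# Sharpness: aabaa has periods 3 and 4 and length 5 = 3 + 4 - gcd(3, 4) - 1,
# yet gcd(3, 4) = 1 is not a period of it.
w = "aabaa"
print(has_period(w, 3), has_period(w, 4), has_period(w, 1))
```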

Proof. Consider a word w such that the period of any repetition in w is bounded 
by p. 

Assume that for some letter a, two occurrences of a are located at a
distance 3p or more. Consider a repetition r defined by the match of these two
occurrences of a. We will show that r has a very particular form, namely

(a) all letters within a root of r are different, 

(b) any letter of r does not occur outside r. 




Figure 1: Proof of condition (a) of Theorem 7

First observe that since the period of r cannot exceed p, the two
occurrences of a are separated by at least three periods p(r). To prove (a),
assume that there is another occurrence of a in the suffix root of r (cf.
Figure 1). Then, there is a repetition r' formed by matching this occurrence of
a with the left occurrence of a. These two occurrences are separated by at
least $3p - p(r) \ge 2p$ letters. Consider p(r'). Since $p(r') \le p$, there are
at least 2p(r') letters between these two occurrences of a. This means that
repetitions r and r' intersect by length at least $2 \cdot \max\{p(r), p(r')\}$
and by Fine and Wilf's theorem, r and r' must coincide. This contradiction
proves that a cannot have another occurrence within a root of r. More generally,
the same argument shows that any letter occurs in a root only once.

Condition (b) is proved by a similar argument. Assume that some letter b
of r occurs outside r, for instance to the right of r. Then consider the match
of this occurrence of b with the leftmost occurrence of b inside r. This match
defines a repetition r'. Similarly to part (a), r and r' intersect by length at
least $2 \cdot \max\{p(r), p(r')\}$ and therefore must coincide by Fine and
Wilf's theorem. This contradicts the assumption that b occurs outside r and
proves (b).

Now, we split all repetitions into two disjoint classes: repetitions verifying
conditions (a) and (b) and the others, called respectively repetitions of type 1
and repetitions of type 2. By condition (b), for any word w, repetitions of type
1 and type 2 in w are non-intersecting. Furthermore, conditions (a) and (b)
ensure that two distinct repetitions of type 1 cannot intersect. Therefore, all
repetitions of type 1 together cannot contribute more than n to the sum.

On the other hand, repetitions of type 2 cannot take more than 3kp letters
altogether in w, as no letter can occur more than 3p times in them: this would
lead to a repetition of type 1 by the above reasoning. Therefore, by Corollary 2,
sum (1) for repetitions of type 2 is bounded by 3kp(ln(p) + 1). This gives the
final bound n + 3kp(ln(p) + 1). □

Notice that the bound in Theorem 7 is optimal in some sense, since sum (1) is
n - 1 for the word $a^n$ and $\Theta(kp \ln(p))$ for the word
$(a_1 a_1 a_2 a_2)^{p/4} (a_3 a_3 a_4 a_4)^{p/4} \ldots$, according to Theorem 5.

5 Concluding remarks 

Many questions related to the combinatorics of repetitions of arbitrary exponent
remain unanswered. A major such question is the precise bound on the number
of such repetitions. Corollary 4 provides an O(n log n) bound for the exponents
at least (1 + ε), for any fixed ε > 0. It would be of great interest to refine
this bound, possibly depending on ε. It is not excluded that, possibly starting
from some ε > 0, or even for any fixed ε > 0, the number of all repetitions of
exponent at least (1 + ε) is O(n). This is a challenging question that seems,
however, difficult to solve, as it would generalize the result of [KK99, KK00] on
the linear number of runs.